Pseudozyma aphidis is an efficient producer of mannosylerythritol lipids, reaching concentrations of >100 g/liter from renewable
feedstocks. Additionally, a cellobiose lipid biosurfactant is secreted during nitrogen limitation. Here, we describe the
genome sequencing of P. aphidis to unravel the genomic basis of its biosurfactant metabolism.

Mannosylerythritol lipids (MEL) are among the most promising
microbial biosurfactants and are secreted by fungi of the
genera Pseudozyma and Ustilago, of which Pseudozyma aphidis
reaches product concentrations of up to 165 g/liter. This species
secretes a mixture of MEL-A, -B, -C, and -D, which share a
common sugar group, two fatty acid residues of medium chain 
length, and different numbers of acetyl groups. In Ustilago maydis, 
the gene cluster for MEL biosynthesis encodes the glycosyltransferase
Emt1p, two acyltransferases Mac1p and Mac2p, as well as a
transporter protein Mmf1p and the acetyltransferase Mat1p (1). A
homologous cluster was found in Pseudozyma antarctica T-34 (2) 
and Pseudozyma hubeiensis (3), indicative of a conserved biosur- 
factant metabolism. However, P. antarctica T-34 and P. aphidis 
revealed significant differences in substrate-dependent induction 
of MEL synthesis compared to that of U. maydis (4). Beyond that, 
P. aphidis secretes an additional cellobiose glycolipid. 

Here, we describe the draft genome sequence of the MEL- 
producing species P. aphidis DSM70725. For this purpose, we se- 
quenced the corresponding genomic DNA to approximately 90-fold 
coverage using the Illumina platform (HiSeq 2000), yielding a
total of 35,141,960 reads, each 50 nucleotides in length. In
addition, we sequenced a paired-end cDNA library of the P. aphidis
transcriptome comprising 48,195,420 read pairs of 2 × 95 nucleotides
in length with an insert size of approximately 300 nucleotides
(HiSeq 2000). Running the Velvet short-read assembler (5) using the 
genomic fragments generated an initial assembly of 2,160 contigs. 
Extending these contigs with SSPACE (6) and the additional reads
from cDNA sequencing reduced the total number to 1,968 contigs, 
resulting in 17.92 Mb for the whole genome of P. aphidis (longest 
contig, 78.1 kb; shortest contig, 1.05 kb; N50, 14.7 kb), revealing a
G+C content of 61.2%. In the next step, we aligned these
nucleotide contigs (BLASTn) against the genome of the closely related
species P. antarctica T-34. The total alignment length of the top BLAST
hits was 14.1 Mb for 1,950 contigs, with an average identity of 
97.63%. This alignment permitted an assignment to 24 supercontigs,
whereas 18 remaining contigs could not be aligned. For the
annotation of newly in silico-predicted genes, we applied Augustus
(7), using U. maydis as a reference species. Accordingly, we detected
6,011 potential complete protein-coding sequences, with
an average length of 1,875 nucleotides (longest coding sequence 
[CDS], 16,854 nucleotides [nt]; shortest CDS, 198 nt). Searching 
this open reading frame (ORF) collection for homologs in the nr 
database of NCBI revealed 5,589 hits, with an average identity of 
74.51% for the top BLAST hits (BLASTp E value, < 1E-10). Strikingly,
we identified the complete MEL biosynthesis gene cluster at
the beginning of supercontig 20, which included EMT1, MAC1, 
MAC2, MMF1, and MAT1. All five relevant MEL genes are significantly
conserved between P. antarctica T-34 and P. aphidis, with
similarities of 89.4%, 86.8%, 91.2%, 87%, and 86.8% at the nucleotide
level, respectively. These results indicate that a similar MEL
pathway exists in P. aphidis as was already shown for P. antarctica 
and U. maydis (1, 2, 8). 
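
The assembly statistics quoted above (fold coverage and N50) follow from simple arithmetic on read counts and contig lengths. The following is a minimal sketch of that arithmetic, not the pipeline used for this genome: the function names `n50` and `fold_coverage` are illustrative, the toy contig lengths are hypothetical, and the genome size is assumed to be the 17.92 Mb assembly length.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_coverage(n_reads, read_length, genome_size):
    """Return sequenced bases divided by genome size (all in nucleotides)."""
    return n_reads * read_length / genome_size


# Hypothetical toy assembly: total 20, so the half-way point is 10.
# Sorted descending: 6, then 6 + 5 = 11 >= 10, so N50 is 5.
print(n50([2, 3, 4, 5, 6]))

# Read count and read length as reported in the text; the genome size is
# an assumption (the 17.92 Mb assembly length).
print(round(fold_coverage(35_141_960, 50, 17_920_000), 1))
```

With the raw read count, the estimate comes out near 98-fold, in the same range as the approximately 90-fold coverage stated in the text. The text does not specify which reads or which genome size estimate entered the reported figure.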

Nucleotide sequence accession numbers. This whole-genome 
shotgun project has been deposited at DDBJ/EMBL/GenBank under
the accession no. AWNI00000000. The version described in
this paper is version AWNI01000000.